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ABSTRACT 

This project deals with techniques to solve Markov 
Chains numerically. It acquaints the reader with Markov 
Chains and their applications in the first chapter. Then it 
discusses the classical Gaussian Elimination for Solving 
Markov Chains in the second chapter. Chapter three 
introduces the reader to iterative methods including the 
common Jacobi and Gauss Seidel methods. Convergence of a 
general iterative scheme is also discussed. The problem of 
slow eeateea eae is illustrated by an example. Methods of 
speeding up convergence however are not discussed. The 
fourth chapter is probably the most important chapter of 
the project as it describes relatively recent techniques 
developed to solve linear systems with sparse matrices like 
those found in Markov Chains. These techniques are called 
projection methods. To eis end, a prototype projection 
step is given and two common algorithms, explained. 
Finally, the concluding chapter summarizes and sheds light 
on the advantages and disadvantages of the different 


methods used in this project. 
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CHAPTER ONE 


INTRODUCTION 


Overview 

Linear equations arise in a lot of engineering and 
scientific applications. One of these applications is 
computing the probability distribution of a Markov chain at 
the steady state. A Markov chain is a system whose states 
change such that the probability of the system to be ina 
certain'state depends only on the state prior to it. This 
project aims at identifying the problem and describing how 
the specific structure of the coefficient matrix directed 
mathematicians to look for efficient methods for solving 
Markov chains numerically. A standard notation for a 
general system of linear equations is Ax = b. When it comes 
‘to Markov chains, we usually seek a system of the form 
mP =n, where P is a stochastic matrix and nis the 


probability distribution at the steady state. In other 


words > Put 1 FOr ise Tease gon and |nl,=1. So, the goal is 
allj 


to find the eigenvector that corresponds to the unit 


eigenvalue. 


An eared are of Markov chains is given in the first 
part of this study. Discrete and continuous Markov chains 
are introduced. Then, the steady state equations are 
derived in both cases. The embedded Markov chain is also 
discussed and is illustrated by an example. A queuing model 
as an example of a continuous Markov chain concludes the 
introduction. 

Direct methods are then introduced by explaining a 
common one, Gaussian elimination. Then, some iterative 
methods are introduced such as the power method and Gauss 
Seidel’s method. The study also presents the problem of 
slow convergence rates that arises when the ratio between 
the dominant eigenvalue and the subdominant is 
approximately equal to one. The Courtois matrix is an 
example of a matrix that exhibits this property of slow 
convergence. 

Finally projection methods are discussed. Projection 
methods are relatively recent and have proved efficient in 
solving Markov chains. A general projection scheme is 
presented in this paper along with two methods: Arnoldi’s 
method and the generalized minimal residual’s method, 


GMRES. ‘Both methods require finding an orthonormal basis 


for the Krylov subspace. To this end Arnoldi’s process is 
presented and demonstrated by an example. The strength of 
the GMRES method lies in the fact that it can be used in 
solving nonsymmetric linear systems. If the given linear 
system is positive definite, then it is recommended to use 


the well-known conjugate gradient method. 


Markov Chains 

Stochastic Processes describe phenomena such as the 
weather of a city on a given day, which can fall into one 
of several states .For example, if we assumed that the 
weather could only be cloudy, partly cloudy, or sunny, then 
it would be a point of interest to know the probability of 
having three consecutive sunny days. In another instance, 
we would be interested in knowing the probability of having 
one cloudy day followed by a sunny day. The preceding 
scenario is an example of a stochastic process. In general, 
stochastic processes can be described by{X(t),t © T}, where 
X(t) wl fe family of random variables indexed by a parameter 
t. The set of values that X(t) can take on is called the 
sate space. If those values were continuous, as in the 


case of the level of water in a dam, then the stochastic 


process would be called a continuous state stochastic 
process. Otherwise, the state would be considered 
a discrete state stochastic process. If the parameter t was 
continuous then the process would be called a continuous 
time stochastic process. Otherwise, the process would be 
called discrete time stochastic process. In the context of 
stochastic processes, ie three different weather 
conditions form what is called the state space. As such, 
the weather condition on a given day may be described by a 
discrete random variable that takes on three values. This 
set of random variables form a stochastic process. As in 
many cases the random variables are indexed by the time T. 
In the above example, T would take only discrete values 
representing the days of the week. As such, the stochastic 
process in hand can be described by{X(t), t Edays of the 
week}, where X(t) represents the weather condition on a 
given day. Then, for example, T would have the form 
T={t :O0< t < +c}. 

A stochastic process that does not change when an 
arbitrary shift of time is introduced is called a 
stationary stochastic process. An example of discrete space 


stochastic process could be the number of planes that crash 


in the US during a given year. Examples of continuous state 
Space stochastic processes are the level of water in a dam 
and the temperature inside a nuclear reactor. A Markov 
process is a stochastic process with a conditional 
probability function satisfying the so-called Markov 
nenidaee described below. A continuous time Markov process 
is a stochastic process indexed by a continuous parameter t 
with a discrete state space. Moreover its conditional 
probability distribution function has the “ Markov 


Property,” i.e., 


Prob {X(t) = x|X(t,) = X-X(t,) = Xppee econ /X(t,) = x,} 
= Prob {X(t) s x [X(t,) = x} for any sequence t,,t,,...,t,,t 
Such that “cy <°t) < iwag St, oS t More-Simply put, che 


current state depends only on the state at time t,. 
Therefore, the system states prior to t, have no effect on 
the current state at time t. The time spent in a state is 
called the sojourn time. In order for the Markov property 
to be satisfied, the time spent in a state should not 
affect the remaining time that will be spent in that state. 
For this to happen the time spent in a given state has to 
follow an exponential distribution if the Markov process at 


hand is continuous. If the Markov process were discrete, 


then the distribution of the time spent in a given state 
would have to be geometric. If the transition from one 
state to another depends on the time the transition occurs, 
then the Markov process is said to be non-homogenous. 
Discrete Markov Chains 

A Markov chain is said to be a discrete time Markov 
chain if the time parameter can take on only discrete 
values. Therefore, the stochastic process at hand would 
consist of the random variables X(0),X(1),X(2),...,X(n). The 
values are the states of the process at time t, where 
5S Oy ives eed }. Assume p,, = prob{x,,, = i| x, = i}. The 
Matrix P= {py} 2S called the transition probability 
matrix. 
The Chapman-Kolmogorov Equations: 
P,; = prob{x,,, = j|x, = i} is a single-step transition 
probability. Now it would be appropriate to find an 
expression for a multiple-step transition probability. 
Proposition 1: p(Af) B/C) = P(A/B{]C).P(B/C) 
Proof: P(A ff) B/C) = P(A () Bf) C) /P(C) 


piAN BNC) PIBNC) 


A/B () C).P(B/C) = 
p ()C).P4 ) (BTC) P(C) 


= p(A [) B/C) 


{n) (n-1) 


Claim: pi) = PP; Where pijis the probability of going 


all k 
from state i to state j in n steps. 


Proof: pi) = prob{x, = j|x, = i} 


It 


prob{X, = j3,X, =k|x, =i}, 0<1<n 


all k 


= § probiX&, = j1%, =k, x, =i). Prob (xX, =k 1X =i}. 


all k 
(By Proposition 1) 


By the Markov Property we get 


py prob{X, = j | X, =k}. Prob {X, = k|X, =i} 
all k 

Now ES = Py i , for 0 <l< n. 

In matrix notation, we have P!” = pYpin), 


In particular, pin) . pilipin - 1) pn 

Definition: A state is said to be recurrent if the 
probability that it will occur again is 1.If the 
probability that a state will not happen again is positive, 


then the state is said to be transient. Now let ia denote 


the probability that the first return to state } occurs n 


steps after leaving it. Hence, 


ie (Prop 4K. mS, Ko et pa ey HD hy = ay er AHL, 


i3 


Recursively, Pi)= )f) Bi, n 21. Also let £, be the 
lel 
probability that the system returns to state j at some 


time, so thatf£.= | When f..is equal to 1 then state j is 
B] 3 J 


nel 
recurrent. In other words, eventually the system will 
definitely return to state j. Otherwise, the state is said 


to be transient. 


Definition: The mean recurrence time, M,,= > nf’ s/t Mas 


nel 
finite then state j is said to be a positive recurrent 


state. If, however, M,,is infinite, then state j is said to 


be a fil ioneruament state. If the Markov chain in hand is 
Sores thes none of its states can be null-recurrent. Then, 
the states are either positive recurrent or transient. 
Moreover, there exists at least one positive recurrent 
state. 

Definition: A state j is said to be periodic with period k 
if when leaving j a return would require a multiple of p 


steps. As such p is the greatest common divisor of the 
integers n such that Bj)> 0. If p is equal to 1 then the 


state is said to be aperiodic. 


1 
- 0 0 0 
4 
1 1 1 1\] 0 2 0 0 1 1 1 1 
Thus, oD, *= Ae. fe fem, | Fea 6 Pa. eo! “hea, Ce oe 
6 3 3 6 1 24 18 12 6 
0 0 — 0 
4 
0 00 1 
-pD-" 
= = = (,12,.16,.24,.48). Note that mQ = 0. 
loos", 


A Queuing model is a common example of a CTMC. For 
instance, we could test the effect of adding an additional 
CPU to a computer network. When the size of the state space 
is small, the calculation of the steady state probabilities 
is simple. However, in many models the state space is age 
and hence efficient numerical techniques would be needed to 
solve such Markov chains. The following is a queueing model 
as another example of a continuous time Markov chain. 
Example: 

Suppose we have two stations in tandem. In other words the 
customer is served by the first station before being served 
by the second station. We need to make the following 
assumptions: 

The arrival of customers is a Poisson process with 
parameter A. There is only one server at each station. 


Customers are served with an exponentially distributed 


iS 


service time whose parameter is p. Scheduling is done ona 


first-come-first-served basis. Customers are treated the 
same. Thus, we have one class of eeceneee:, Both queues 
have infinite capacities. The states are the ordered pairs 
(i,j), where i and j are the number of customers waiting in 
the two queues. Queuing models are not the only field where 
Markov chains are useful. Computer networks, computer 
design, and biology are also major fields in which Markov 
chains could be used to measure the efficiency of their 
models. These models have developed in a manner that gave 
rise to a large increase in their state space. Solving 
linear equations is a very important branch of numerical 
analysis. The characteristics of Markov chains narrow down 
the aes of the methods used to solve them numerically. 
One characteristic, for example, is the fact that the 
dominant eigenvalue of a stochastic matrix is the unity. 
This makes the problem of finding the steady state 
distribution of an ergodic Markov chain equivalent to 
finding the eigenvector that corresponds to the unit 
eigenvalue, as we will see ee In this paper, the focus 
will be on demonstrating some characteristics of Markov 


chains like the one mentioned above and how they lead us to 


16 


efficient numerical methods to solve them. Three numerical 
methods are discussed: direct methods, iterative methods 
and projection methods. It is worth mentioning that there 
are other numerical techniques tailored to solving periodic 
Markov chains and finding transient states. Those 
techniques could be a good material for further study and 


could be found in a textbook like [1]. 
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CHAPTER THREE 


ITERATIVE METHODS 


Iterative methods are commonly used when the linear 
system in hand is sparse. These systems usually arise when 
solving differential equations. They Sieg arise in 
structural analysis where the forces exerted on a truss, 
for example, are unknowns. Usually, the problem set up is 
such that only few forces are involved at each joint. 

Hence, only few unknowns are involved in each Scuation, 
which translates into a sparse coefficient matrix. The same 
situation occurs when trying to solve discrete or 


continuous Markov chains where a lot of the transition 


probabilities or transition rates are zeros. 


The General Problem 


The goal again is to solve Ax = b, where A is an 


n xX n matrix, x€ R®. We are going to use the general fixed 


point scheme to solve this system. A typical iteration has 


(k+1) 


the form x = ox"),k EN,@ is a mapping from R® to a. 


Ax = b can be written as x = (I-A) x + b. Let C= I-A. 
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yY(x) = Cx +b is called the iteration function. If € isa 


fixed point of x = (x), i.e., if &— = o(&), then we have 


(RAD) E= © fx!) eas (E) = (as = é) = c* (x _ é) : 


(k+1) 


If x’ « &, then the sequence x = o(x*) converges to —& when 


lim,..c” =0. This occurs if and only if the spectral radius 


of C is strictly less than 1. I will prove only one 
direction of the above result. 


Proof: Assume the contrary, i.e., let p(C) = 1. Therefore 
there exists an eigenvalue A with |A| = 1 and a vector x # 0 


such that Cx = Ax. Thus, C*x = A‘x. 


But Lime * 0. Hence, {C*} can not be a null sequence. 


The Power Method 

The power method is an example of an iterative method. 
In many situations we are interested in finding the 
elgenvalue with largest or smallest absolute value and 
their corresponding eigenvectors. The power method is 
useful in achieving this goal. Actually, the power method 
can be used to find all eigenvalues and eigenvectors of a 
diagonalizable matrix as can be seen in [2]. Again when it 
comes to applying the power method to mP = P, where P is a 


transition matrix, we know that the dominant eigenvalue is 


29 


one. But the power method is sometimes applied to matrices 
other than P, where the dominant eigenvalue is not one as 


is the case with the Jacobi and Gauss Seidel methods. 


Let A be a diagonalizable nx nmatrix. Let z’ © R” 


Now let’s investigate the iteration z” = Az*",k =1,2,.. 
z= AZ*Y => 2 = ate. Let |A,| 2 |A,| =... = fA,| be the 
eigenvalues of A. Moreover, let (scp, alea ged be the set of 


corresponding eigenvectors. Now we can write 


0 

2) = ax’ + Ox? +..... a,x". Therefore, 

(x) n 

Ze ea PO POG Ie” AE OG Ee oc Ox} 

= oA‘ x’ + o Ax? +....5: a, Ax” 
(kK) kel Kk 2 kon ‘ k _ yy kak 

Hence, 22°) Ou,  - Oghs Koh ew aes ON & ~ Since Ax, = AAs 

A) | Ay 
= A,*[a,x’ +a,)/—] x? +..... a, |=) x7] 

AS aby, 


Now if Z is selected so that a, = 0, then we have the 


following: 
Tf |A,|>|A,|, then tip 56 
71 ay" pene a 1 
(Note that when A, = 1, the above equation becomes 


lim z = a,x’) 


k- 00 
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Therefore, lim}. = 
k 


n 
(k) 
_ Z,, 1 * 
So, iim x: aOR gv Sole pamees yn (*) 
1 
(k=1) 
li 2, 1 kk 
Lim ca Ok ew ee ep eae ane GE) 
aE 
Dividing (*) by (**) yields 
(k) 
i ne ee 1 (k-1) 
ere = 1,%, # 0,z, z= 0 
Lv 
(k) 
lim = Avex, # 0,2" = 0 
k> ow Z, 
ok) 
k v (k-1) ‘ (kK) 
Let q,= py 12 « 0. Then, lim q, =A,. 


Again the focus is on solving mP = P, where P is the 
transition matrix. As the spectral radius of P is equal to 


the unity, we can claim that A,=1. Hence, limz = a,n, which 
. km 0 


can be normalized to find on. 

Example: 

‘We will apply the power method to a stochastic matrix A 
with three different initial states. The eigenvalues of A 


are Aj = 1,A,, = -0.25 + 0.59791 with corresponding 2-norm of 
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‘ : oN ; 
1, 0.65, respectively. So, i = 0.65. The approximate 
2 2 


eigenvector is accurate to 4 decimal places after 25 


iterations because 0.65% « 2x 10°. The MATLAB code for the 
power method, which is given below, is followed by the 
output. z0 is the initial vector, and z is the approximate 
elgenvector that corresponds to the dominant eigenvalue of 
the given transition matrix A. 

function z= PM (A,z0) 

global A 

global z0 


global z 


Z=2*A; 
end 
0 8.2 
A={0 .1 .9 
6 0 4 
Input 
Z = [1, 0, 0] ° 
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Output: 

C= [.2813, .2499, .4688]' 

The same vector z is obtained if the power method is 
applied to z =[0,1,0]'or z = [0,0,1]' 

Now let us look at some of the eueke donated tothe 
establishment of three other common methods, the Jacobi, 


Gauss Seidel, and SOR methods. 


Theoretical Background 
Let Ax = b. Assume that A= M-N where M is non-singular. 


M-N is’ called a splitting of A. Thus, (M-N) x =b 


Mx = Nx + b =x M'Nx + M’bwhich gives thus the iterative 
procedure: x'**) = M°Nx 4 M“b. 

Let H = M'‘N=> x**) 2 Hx 4 c,where c = M“b. 

H is called the iteration matrix .The iteration matrix is 


the one that characterizes different iterative techniques 


such as the method of Jacobi and Gauss Seidel. 


The Jacobi Method 

“The Method of Jacobi can be applied to a non-homogenous 
system but this would result in a slow rate of convergence. 
Usually, faster convergence can be obtained when applying 


the method to homogenous systems. Therefore the goal is to 


29 


solve mQ=0 or equivalently Q’n™” = 0, where Q is the 
infinitesimal generator matrix. For simplicity, let x = vei 
and Q'= D — (L+U), where D is a diagonal matrix whose 
entries are the entries are those of the diagonal of 0" «. 1 
and U are strictly lower and upper triangular matrices 


respectively. To match the method with the general 


splitting technique explained.earlier, we let M = D, 
N = L+U. Thus, the iteration matrix H,; is equal to M‘*Nor 
equivalently H, ve equal to D(L+U). We note that D is non- 


singular since d,, = Ofor all i. 


This leads to the iterative scheme x"*” = H,xor 
x oe DL +U)x™ Let s,, = 1, +u,. Clearly, 5s, = 0, 
a a 1 | 
for all i. Let p,, = ¥ tu s,,Therefore, p,,; = a 8 since 
k=l ii 


c, = 0 if i # k,where [c,,] = pD“. Thus, 


1 
(k+1) (k) 
x aan [yo + Uy5) X, | 


The Gauss Seidel’s method is similar to the Jacobi’s 


method. The difference is that the iteration matrix of the 


method of Gauss Seidel is (D-L)’U. This is equivalent to 


(k+1) 
i ’ 


using x 


where i <4, to compute x'**? , 


BI 
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Example: 

We seek to find the steady state solution of the 
reliability model whose infinitesimal generator isa Ox 9 
matrix adapted from [1]. To this end, the Gauss Seidel 
method as used. The input of the algorithm consists of an 
approximate solution, the maximum number of iterations, and 
the infinitesimal generator Q. The output is the 
approximate solution obtained by the last iteration and the 
associated norm of the residual vector. The output also 
includes the iteration matrix B and its eigenvalues. Note 
that éhe subdominant eigenvalue is 0.9827, very close in 
modulus to the dominant eigenvalue. This translates into 
slow rate of convergence. Setting itmax, the maximum number 
of iterations, to 800 we obtains an accuracy of six decimal 
places. If we set itmax = 10, the accuracy gets. extremely 


poor as shown below. 


-60.4 60 0 25 0 0 0 0 

60 -90.4 120 0 25 0 0 0 

0 30 -120.4 0 0 25 0 0 0 

4 0 0 -~60.7 60 0 al 0 0 
A=|0 4 0 60 -90.7 120 0 i 0 

0 0 me 0 30 S1DOr 7: 26 0 1 

0 0 2 0 0 -61 60 0 

0 0 2 0 60 -91 120 

0 0 0 2 0 30 -121 
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b=[0,0,0,0,0,0,0,0,0]',and itmax=10 
Output: 


x=[.2049, .2052, .0514, .2255, 
.2263,.0568, .1675, .1665, .0414] ' 


res=[(14.5884,0.3779,0.3302, .3189, 
.3082,.2979, .2880, .2784, .2693,.2605]' 


Block Iterative’ Methods 

We will illustrate how block ieoratige techniques can 
be used to find the steady state probability of an 
infinitesimal generator matrix. We will also show, ‘by an 
example that block iterative techniques are generally 
faster than their iterative counterparts. 
First we partition the vector m into N subvectors and the 
maenis Q into N’to obtain block representation of nO = 0 as — 


follows. 


Ov, Que . . ° Quan 


32 


Then, we split the matrix Q’by the block splitting: 
Q" = Dy - (Ly + Uy). Here D, is a block diagonal matrix while 
L, and U,yare strictly lower and upper block triangular 


matrices respectively. 


De UO te ones VD 0 OO ee ow 0 
O- “Dye Lisp <0 0 
Thus, Dy= . . , : , y= ; ° ; ; , and’ 

0 0 « Dy. Dien. aes 9) 

0 UL, Uy 

0 0 Uae 

U, = 
0 O 0 


Similar to the Gauss Seidel method the block Gauss Seidel 
method is given by the iteration: (De Dig) me Uk The 
feiigaane example shows the advantage of the block Gauss 
Seidel method over the Gauss Seidel method with respect to 
the speed of convergence. The transition matrix was 
obtained from Courtois {1]. The Block Gauss-Seidel MATLAB 
Function is given a Bee a es matrix, and a vector ni 
which describes the length of each block. I have chosen ni 


= [3,2,3]. Another two input parameters are the number of 
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outer iterations labeled itmaxl and the number of inner 
operations itmax2. itmaxl is simply the number of times the 
program acts on the entire set of blocks, while itmax2 is 
the number of times the Gauss-Seidel algorithm is called to 
solve an individual block. The program can be modified in 
such a way that would enable it to solve the individual 
blocks using a direct method rather than the Gauss-Seidel 
procedure. The output of the program, which consists of the 
solution vector x and the residual vector obtained after 10 


iterations, along with the Courtois matrix are given below. 


Recs, pei aul 0 -0005 0 .00003 0 


0 65 .8 .0004 OO .00005 0 .00005 
.149 .249 .0996 0  .0004 0 .00003 0 
‘ -.0009 0 .0003 .7. ~)=— 399 0 .00004 0 
- 0 .0009 0 .2995 .6 .00005 0 .00005 
.00005 .00005 0 0  .0001. .6 ar .1999 
0 0 .0001 .0001 0 ~- .2499 8 225 
.00005 .00005 0 0 0 15 .09990 .55 } 
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.08928265275448\ 


s09275763/505.11 
-04048831201636 
#58 53319081979 
.11893820690415 
.12038548110608 
~27779525244933 
-10181926644470 


’ 


Residue 
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.00000000000303x10° 


.CHAPTER FOUR 


PROJECTION METHODS 


Projection methods are relatively recent techniques 
aimed’ at solving systems of linear equations. Again ewan 
be dealing with large and sparse systems. A typical 
projection method would attempt to find an approximate 


solution in a subspace of a much lower dimension than that 


of R". The dimension of that subspace will be denoted by m. 
We will need another subspace to introduce constraints that 
are neeasuayy to ee an approximate solution. Once an 
approximate solution is found, the process is repeated 
again until a sufficiently small residual is obtained. The 
- following is a general outline of the method. 

Let Ax* = £. (i) 

Let K,and L,be two subspaces of R". 


Two common choices for Lj are L, = K,and L, = AK,. 
LSet. 1 Sy Nope eas ,v,} be a basis of 
Ka, (amy “W = {Wap Wy fem es ,w,}be a basis of L,. 


m 


Let x, be an initial approximation to the solution of (i). 


Also, let x be the new approximation obtained by a single 


projection step such that x = Ro, tozp 2 SUK. 
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Now let us impose the so-called orthogonality condition 
<f-Ax, v > = 0 Vve L, -Also, assume that z = V,y, where 
V,is the matrix whose columns are the vectors of U, and the 
initial residue be x, = f — Ax, .If x were to be the solution 
of (1), then the following would be true: 

Bo Xi 2) me OE 

Ax, + Az =f. 


Thus, Az = f - Ax,. 


Because <f-Ax, v > = 0, 
it follows that <f-A(x, +z), v > = 0. 
Hence, <m% - Az,v > = 0, and 


W.(r, -, AV,y) = 0. 

Assuming that WAV, is nonsingular,we have: 

wavy? Wir = y, and x= x) + VAY = X +[WIAV, J Win 

The above is called Petrov-Galerkin supeod mae eas 

How do we choose L,? A good choice is the subspace defined 


by Li: = AK,. This can be justified by the following theorem. 


Theorem 1: L, = AK,,x is the approximate solution provided 
by the Petrov-Galerkin approximation <it minimizes the. 


Fuclidean norm of the residual vector f- Ax, for all 


37 


x Gx, + Ky. Note that it can be shown that -x,.+K, is a 
subspace. It is called an affine subspace.. 
Proof: We will prove only one direction. 


Let x* be any vector in x, + K,. Therefore, 


if - Ax *|? =e - Al (x * -x) + x] 


= (f - Ax, £ - Ax) -— 2£ - Ax, A(x * —x)) + (A(x * -x), A(x * -x)) 
By the orthogonality condition, the middle term in the 
above expression is zero. 


2 


Thus, | - Ax *|/=(€ - Ax’, £ - Ax’) + (Ax — x"), Ax - x’) )2[f - ax’ 


A lot of the projection methods make use of the so-called 
Krylov subspace defined by: 

K, (A, v) = span{v, A’v, A°v,...... ,A™'v}for some v € R®. 

These methods require obtaining an orthonormal basis for 


the Krylov subspace. One classical method is the known 


Gram-Schmidt Process. 


Gram-Schmidt Orthogonalization Procedure 


The tnput, 26 Ka (tp op hes peel pre SR: 


AEE: Vey. = |x, |, ‘and q, = ea 


2. For j = 2,3,....,m do 
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4a, = Gikap slp Qe ge SL 


4-1 
Se > yh 


1 


ry 
KO 
ll 


53 lal and q, = q; / G4 
The input of the algorithm is an (n xm) matrix X of rank m 
less than or equal to m. The columns of X represent the 
vectors that span a given subspace. Basically, the 
algorithm factors thé matrix X into QR. Q is an 
nx morthogonal matrix whose columns form the basis of the 
given subspace. R is an nx nupper triangular matrix. The 


columns of X can be written as linear combinations of the 


columns of Q. 


Examples: 
% = Go + 42d) 
%3 = G3 + 439, + %393 


The last line of step 2 ensures that the q,‘s are 


normalized. Now let us prove that Q is orthogonal. 


j-1 


q, = x, - > (aix,) ds 


Lei 


FOr dll kus jel, we fave 


jr-1 
Ged = x, — D aelaxs ay 


iel 
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jo1 


ax, — S ayas(arx,) 


i=l 


= OX, a OX; = 0 

Gram-Schmidt Orthogonalization Procedure was modified to 
yield.a more numerically stable version than the one 
menevened above. The new version cas referred to as 
Modified Gram-Schmidt Orthogonalization ppoesaane. When the 
Modified Gram-Schmidt Orthogonalization Procedure is 
applied to a Krylov subspace the method is referred to as 
Arnoldi’s process. The following eee MATLAB program used 
to find an orthonormal basis for the Krylov subspace sear 
Arnoldi’s process. The input is a WonsearerWeetes 
v, > |\v,|, = 1 and a matrix A. Clearly, A is-nxn and: v,is 
nx il. The output is an orthonormal basis for the Krylov 
Subspace K, = spantv,, Av,,..., A™“v,). The matrix whose 
columns form this basis will be called V,. 
function [Hbar,v] =modgs(A,v,m) . 
[n,n] = size (A); 
for j= i:m, 

vjev (Lin, 4): 

W=A*v}i- 


for i=l:j, 


A0 


vi=v(lin,i); 

Hbar (i,j) =vi'*w;_ 

w=w-Hbar (i,j) *vi- 
end 
Epan ant nom: 
v=[v,w/Hbar (j+1,4)]; 


end 


For example, if the input parameters were: 


A= 
ch 2 3 
-1 4 5 
g 7 4 
m= 3 
v = [.6,.8,0]', 


; 026000. 0.0235 “<2 7997 =0.56593 
The output would be:v = |0.8000 -0.0176 0.5997 0.5653]. 
0.0000 0.9996 0.0294 0.6007}: - 


Remark: The first three columns of v form the desired 

orthonormal basis for the Krylov subspace. The eigenvalues 
of A can be approximated by those of H,, ree H.=VIAV, -H, 
represents the restriction of the linear transformation of 


A to the subspace K,with respect to the orthonormal basis 
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V, obtained by Arnoldi’s process. Let A,be the eigenvalues 
of H, i= 1, 2,---,M™, and lety,be the corresponding 
eigenvectors. In other words, H,y, = A,¥,,;1 = 1,2,...,m. As m 
gets larger, A,becomes closer to an eigenvalue of A and V,y, 
gets closer to the corresponding eigenvector of A. 


Claim: v' (A(V,y,) - A,(V,y,))= 0, V v EG K,,i = 1,2,...,m. 


Proof: v' (A(V,y,) -— A 


i 


= vT™(A(Vyy,) — (VaAyy,) ) 


1 


=v' (A(V,y,) - (V,V,AV,y,)) since VIAV.y, = Avy,- 
Now note that V,V, =I. This is true because V,is a set of 
orthonormal vectors. 


Thus,’ v’ (A(V,y,) — (V,V,AV,y;) )=v" (A(V,y,) - AV,y,)= 0.The result we 


just proved is called Galerkin condition. 


The Full Orthogonalization Method 
The full orthogonalization method approximates the 
solution of the linear system Ax = b. The algorithm starts 


with an initial residual vector m, where H=b-Ax,. Let v, 


xc : 
0 2 3 m-1 

il, , and K, = span{v,,AvV,,AVy,---+-- ce Vs 
Ol2 
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An approximate solution can be obtained by ” = x, + z,with 
z, © K,. In the FOM the approximated solution is forced to 
satisfy the Galerkin condition: v'x, = 0, for all vEK,. 

Let V, = {V,,Vj,-.--+,V,}be the orthonormal basis for the 
sbevesmenuioned Krylov subspace. Thus, z, © K,can be written 
as a linear combination of Vy, Vyy---+ Vas say 24 = VV 
Thererore, x. -= ix). + V.¥, « Since Ax, = b=, Ax, = DP =x, and 


roa then we have Ax, = Ax, + AV,y, - 
Ola 


A(xX,-X)) = AVY, 


= AVY. 


Ky 
I 


n = % + AVY, 


K 
I 


4s Sa. ae 
Hence, Var, =Va (ml, vi + AVay.) 
=|], Vavi + VeAVpYn 
=[ol, ee. + Haar 
where e,is the first column of I,. 
chus, y, = #2 |x, e- 


Note that in the above statement Viv,was replaced by e,. 


This can be easily seen by the following argument: 
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N9 
vi 0 

Viv.= 3 |e v= (since V,is a set of orthonormal vectors 
ia 0 


The Generalized Minimal 
Residual Method 


The generalized minimal residual method is similar to 
the FOM method. The only difference between the two methods 


is that when the GMRES method is used,z,is chosen in such a 
way to minimize lb - Ax, ||, - It can also be shown that the 


GMRES method is a projection method with the subspace L as 
aac earlier equal to AK, ‘where K is the Krylov 
subspace. The GMRES is used when the coefficient matrix A 
is nonsymmetric positive definite. If A is symmetric and 
positive definite, then the Conjugate Gradient method, a 
projection method, yields the beat solution. 


We note that at the end of each iteration of Arnoldi’s 


process we get:h,,44V5.. = Av, — Rhyv, - hyv, ~.-.. — hy,;v,; which 
is equivalent toAv, = v,h,, + voh, +... + Vyhy, + Vyii85,1,5,-Let 
Kava= [Var Vorss ee Vagal and Tet V,,, be the matrix whose 
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columns consist of the vectors of Rot Then, AV, = Visi Hn ¢ 


where H, is an (m+tl)xm matrix whose last row is 


AO bred ¢ Peano Ces et 


m+1,m 


] and whose other rows: are identical to 
those of H,.Again we let x, = x + Z,, where z,is an element 
of the Krylov subspace chosen in a way to minimize 

lb = Ax,|l, « Let. 2. VaY s 


Thus, fp -_ Ax, (|) = lp — Al + VaY Il, = lx G AV, y||,=(Bv.Avyl, ° 


Since AV, = V,,,H,, then|v,_Av,yl, = 


m+1 m 


VeniBrs. Hy ¥ . Therefore, 
2 
minimizing Ib - Ax, ||, is equivalent to minimizing 


is a constant. Hence the 


- However a 


Vas BV, Hy Y) 
2 


2 


minimization problem is reduced to minimizing only 


This least squarés problem can be solved using 


lew, H,, y) 


2 
OR factorization. See, for example, [1]. 

Theorem: If m = the degree of the minimal polynomial of A, 
then the exact solution of (I) belongs to K,(A,b). 


The degree of the minimal polynomial of a matrix A is the 


smallest m that makes ee a rere a linearly dependent. 
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5 -6 -6 
For example, if A =j)-1:.4 2] it can be shown that 
3 -6 -4 


{t, A, a°} is the smallest set whose elements are linearly 


dependent. Hence, the degree of the minimal polynomial of A 
is 2. In order to find it we would have to solve 


A? +aA+bI = 0. It is not hard to show that 


a= -3 and b= -2. 


Example 
5 -6 -6 ‘i 
Let A= /]-1 4 2], b=|2].The eigenvalues of A are 
3-6 -4 3 


2,2,and 1.The minimal polynomial of A is q(t) = (t -2)(t-1) 


as illustrated above. The exact solution of Ax = b is 


14 i\=25 
-3.5], and K,(A,b) = 2|,| 13 
15 3) \-21. 
14 14 1 —25 
Now |-3.5/GK,(A,b), since]-3.5[= 1.5]}2|-0.5 | 13 


15 15 3 237 
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CHAPTER FIVE 


CONCLUSION 


To conclude this study I will compare and analyze the 
different: numerical techniques discussed a the earlier 
cpustees. ae Courtois tei will be used to compare some 
OF ‘the numerical methods discussed in this study. First, 
the direct methods such as Gaussian elimination and LDU 
decomposition can be used to solve relatively small nicest 
systems of equations. However when it comes to solving 
large sparse matrices like those that arise in solving 
Markov chains, direct methods display a major disadvantage. 
Remember that direct methods are based on creating zeros at 
certain locations of the coefficient matrix. In doing so, 
unfortunately, non-zero entries are created in other 
locations that originally contained zeros. This process is” 
called fill-in -and. it destroys the sparse structure of the 
original matrix. Consequently, large computer storage joule 
be needed to store such a matrix with lots of non-zero 
entries. Nevertheless, direct methods have the advantage of 
enabling us to determine the number of steps required to 


solve a linear system. 
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Second, iterative methods do not suffer from having to 
have large computer storage as direct mecieds do. The 
reason for that follows from the fact that iterative 
methods do not alter the coefficient matrix and hence no 
fill-ins are created. Unfortunately, when iterative methods 
are used, we cannot predict the number of iterations 
required to find a solution with a preset error. There are 
two factors that govern the convergence rate. The first one 
is the ratio of the dominant eigenvalue to the subdominant 
eigenvalue. If this ratio is close to one, the rate of 
convergence becomes very slow. An example of a matrix that 
possesses such a behavior is ee Courtois ee, Actually, 
the Courtois matrix belongs to the class of nearly 
se asnsssabie mares usually abbreviated by NCD. If 
iterative methods were to be applied to a an NCD matrix, 
then preconditioning techniques would be required to speed 
up the rate of convergence. The second factor is the 
initial approximation. In fact, this factor is an advantage 
when using iterative methods. When an experiment is 
performed several times, usually the parameters are 
slightly changed and hence we expect the solutions to be 


approximately the same. So the calculated solution of a 
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given experiment, for example, can be used as an initial 

~ solution to solve the system shesinea by a subsequent 
experiment. Projection methods such as Arnoldi’s method and 
GMRES are efficient to tackle NCD Markov chains. However a 
storage problem arises when the dimension, m, of the 
subspace used is eee In this case we use a value of m. 
that would be suitable for the available computer memory. 
If the resulting approximate solution is not satisfactory, 
then we use it as an initial seneouinmeien: These methods 
are referred to as iterative Arnoldi’s method and iterative 
GMRES. Now I will present results of some MATLAB’s 
programs to demonstrate some of the facts discussed above. 
Table 1 shows the eigenvalues of the transpose of the 
Courtois matrix in the first column. In the second column, 
we have the eigenvalues of the iteration ee found when 


applying the method of Jacobi to the Courtois matrix. 
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Table 1: Comparison of Figenvalues 
of Iteration Matrices 


1.0 


-99980000000000 
-99849479689134 
e PO00 2623150850 


.55006663986827 
.40003338516447 
.30071431880876 


- 99938353640096 
s 997908 95882791 
-99579335294684 
.98325797025592 
-50277001092246 
.50277001092246 


-41640735625114 


~14953537224133 


Note that in both columns the ratio of the dominant 
eigenvalue to the subdominant eigenvalue is close to one. 
Actually, this ratio in the first column is larger than its 
counterpart in the second column. Thus we expect the Jacobi 
method to converge faster than the power method. 

Techniques such as preconditioning can be used to tackle 
the problem that arises when this ratio is close to one. 


These techniques can be found in [1]. 
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